Smart Paper Notes

An Interpretable Federated Learning-based Network Intrusion Detection Framework

Tian Dong, Song Li, Han Qiu, Jialiang Lu

Category: Machine Learning

2022-01-10

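Learning-based network intrusion detection systems (NIDSs) are widely deployed to defend against various cyberattacks. Existing learning-based NIDSs mainly use a neural network (NN) as the classifier, which relies on the quality and quantity of cyberattack data. Such NN-based approaches are also hard to interpret, which makes it difficult to improve their efficiency and scalability. In this paper, we design a new local-global computation paradigm, FedForest, a novel learning-based NIDS that combines the interpretable Gradient Boosting Decision Tree (GBDT) with the Federated Learning (FL) framework. Specifically, FedForest is composed of multiple clients that extract features from their local cyberattack data for the server to train models and detect intrusions. A privacy-enhancing technique is also proposed in FedForest to further strengthen the privacy of the FL system. Extensive experiments on 4 cyberattack datasets covering different tasks show that FedForest is effective, efficient, interpretable, and extendable. FedForest ranked first in a collaborative learning and cybersecurity competition for Chinese college students.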

TDGIA: Effective Injection Attacks on Graph Neural Networks

Xu Zou, Qinkai Zheng, Yuxiao Dong, Xinyu Guan, Evgeny Kharlamov, Jialiang Lu, Jie Tang

Category: Machine Learning

2021-06-12

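Graph neural networks (GNNs) have achieved promising performance in various real-world applications. However, recent studies have shown that GNNs are vulnerable to adversarial attacks. In this paper, we study a recently introduced realistic attack scenario on graphs: the graph injection attack (GIA). In the GIA scenario, the adversary cannot modify the existing link structure or node attributes of the input graph; instead, the attack is performed by injecting adversarial nodes into it. We present an analysis of the topological vulnerability of GNNs under the GIA setting, based on which we propose the Topological Defective Graph Injection Attack (TDGIA) for effective injection attacks. TDGIA first introduces a topological defective edge selection strategy to choose the original nodes to connect with the injected ones. It then designs a smooth feature optimization objective to generate the features of the injected nodes. Extensive experiments on large-scale datasets show that TDGIA consistently and significantly outperforms various attack baselines when attacking dozens of defense GNN models. Notably, the performance drop on target GNNs caused by TDGIA is more than double the damage caused by the best attack solution among hundreds of submissions to KDD-CUP 2020.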

MOPRD: A multidisciplinary open peer review dataset

Jialiang Lin, Jiaxin Song, Zhangping Zhou, Yidong Chen, Xiaodong Shi

Category: Artificial Intelligence | Natural Language Processing | Machine Learning

2022-12-09

Open peer review is a growing trend in academic publishing. Public access to peer review data can benefit both the academic and publishing communities. It also supports studies on review comment generation and, further, the realization of automated scholarly paper review. However, most existing peer review datasets do not provide data that cover the whole peer review process. In addition, their data are not diverse enough, as they are mainly collected from the field of computer science. These two drawbacks of the currently available peer review datasets need to be addressed to unlock more opportunities for related studies. In response, we construct MOPRD, a multidisciplinary open peer review dataset. This dataset consists of paper metadata, multiple manuscript versions, review comments, meta-reviews, authors' rebuttal letters, and editorial decisions. Moreover, we design a modular guided review comment generation method based on MOPRD. Experiments show that our method delivers better performance, as indicated by both automatic metrics and human evaluation. We also explore other potential applications of MOPRD, including meta-review generation, editorial decision prediction, author rebuttal generation, and scientometric analysis. MOPRD is a strong endorsement for further studies in peer review-related research and other applications.

LUNA: Language Understanding with Number Augmentations on Transformers via Number Plugins and Pre-training

Hongwei Han, Jialiang Xu, Mengyu Zhou, Yijia Shao, Shi Han, Dongmei Zhang

Category: Natural Language Processing

2022-12-06

Transformers are widely used in NLP tasks. However, current approaches to leveraging transformers to understand language expose one weak spot: number understanding. In some scenarios, numbers occur frequently, especially in semi-structured data like tables. But current approaches to number-rich tasks with transformer-based language models abandon or lose some of the numeracy information (e.g., by breaking numbers into sub-word tokens), which leads to many number-related errors. In this paper, we propose the LUNA framework, which improves the numerical reasoning and calculation capabilities of transformer-based language models. With the number plugin of NumTok and NumBed, LUNA represents each number as a whole in the model input. With number pre-training, including regression loss and model distillation, LUNA bridges the gap between number and vocabulary embeddings. To the best of our knowledge, this is the first work that explicitly injects numeracy capability into language models using Number Plugins. Besides evaluating toy models on toy tasks, we evaluate LUNA on three large-scale transformer models (RoBERTa, BERT, TabBERT) over three different downstream tasks (TAT-QA, TabFact, CrediTrans), and observe that the performance of the language models is consistently improved by LUNA. The augmented models also improve the official baseline of TAT-QA (EM: 50.15 -> 59.58) and achieve SOTA performance on CrediTrans (F1 = 86.17).
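
To make the idea of treating each number as a single unit more concrete, here is a minimal sketch. It is not LUNA's NumTok or NumBed: the function name `split_numbers`, the `[NUM]` placeholder, and the regular expression are all assumptions made for illustration. It only shows how numbers could be pulled out of text as whole values instead of being broken into sub-word pieces.

```python
import re

# Hypothetical illustration only; this is NOT LUNA's NumTok implementation.
# Matches integers and decimals, optionally with thousands separators.
NUMBER_PATTERN = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")
NUM_PLACEHOLDER = "[NUM]"  # assumed placeholder token name


def split_numbers(text):
    """Return (tokens, values).

    tokens: whitespace tokens with every number replaced by one placeholder.
    values: the parsed numeric values, in order of appearance.
    """
    values = []

    def replace(match):
        # Keep the full numeric value instead of splitting it into pieces.
        values.append(float(match.group().replace(",", "")))
        return f" {NUM_PLACEHOLDER} "

    tokens = NUMBER_PATTERN.sub(replace, text).split()
    return tokens, values


tokens, values = split_numbers("Revenue rose from 1,250.5 to 3000 units")
```

In a real model, each placeholder position would then receive an embedding computed from its numeric value; how LUNA does this is described in the paper, not here.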

Differential Evolution based Dual Adversarial Camouflage: Fooling Human Eyes and Object Detectors

Jialiang Sun, Tingsong Jiang, Wen Yao, Donghua Wang, Xiaoqian Chen

Category: Computer Vision | Artificial Intelligence

2022-10-17

Recent studies reveal that deep neural network (DNN) based object detectors are vulnerable to adversarial attacks that add perturbations to images, leading the detectors to produce wrong outputs. Most existing works focus on generating perturbed images, also called adversarial examples, to fool object detectors. Although the generated adversarial examples can retain a certain naturalness, most of them are still easily noticed by human eyes, which limits their application in the real world. To alleviate this problem, we propose a differential evolution based dual adversarial camouflage (DE_DAC) method, composed of two stages, to fool human eyes and object detectors simultaneously. Specifically, we try to obtain a camouflage texture that can be rendered over the surface of the object. In the first stage, we optimize the global texture to minimize the discrepancy between the rendered object and the scene images, making the object difficult for human eyes to distinguish. In the second stage, we design three loss functions to optimize the local texture, making object detectors ineffective. In addition, we introduce the differential evolution algorithm to search for near-optimal areas of the object to attack, improving the adversarial performance under given attack-area limitations. We also study the performance of adaptive DE_DAC, which can adapt to the environment. Experiments show that the proposed method achieves a good trade-off between fooling human eyes and fooling object detectors under multiple specific scenes and objects.
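
For readers unfamiliar with differential evolution, the following is a minimal sketch of a plain DE/rand/1/bin loop on a toy objective. It is not the DE_DAC search: the objective (`sphere`), the bounds, and every hyperparameter are assumptions chosen for illustration, whereas the paper searches over attack areas on an object.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, generations=200,
                           mutation=0.8, crossover=0.9, seed=0):
    """Minimise `objective` inside the box `bounds` with a DE/rand/1/bin loop."""
    rng = random.Random(seed)
    dim = len(bounds)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    scores = [objective(ind) for ind in population]

    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the current one.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < crossover:
                    gene = population[a][d] + mutation * (
                        population[b][d] - population[c][d])
                    lo, hi = bounds[d]
                    gene = min(max(gene, lo), hi)  # clip to the search box
                else:
                    gene = population[i][d]
                trial.append(gene)
            trial_score = objective(trial)
            if trial_score <= scores[i]:  # greedy selection keeps the better one
                population[i], scores[i] = trial, trial_score

    best = min(range(pop_size), key=scores.__getitem__)
    return population[best], scores[best]


def sphere(x):
    """Toy objective (assumed for this sketch): minimum 0 at the origin."""
    return sum(v * v for v in x)


best_point, best_score = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
```

Because DE needs only objective values and no gradients, it suits discrete or non-differentiable choices such as which region of an object to attack.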

FedBA: Non-IID Federated Learning Framework in UAV Networks

Pei Li, Zhijun Liu, Luyi Chang, Jialiang Peng, Yi Wu

Category: Machine Learning

2022-10-10

With the progress of science and technology, the Internet of Things (IoT) has gradually entered people's lives, bringing great convenience and improving work efficiency. Specifically, the IoT can replace humans in jobs that they cannot perform. As a new type of IoT vehicle, the Unmanned Aerial Vehicle (UAV) is the subject of active research, and its development prospects are very promising. However, privacy and communication remain serious issues in drone applications. This is because most drones still rely on centralized cloud-based data processing, which may leak the data they collect. At the same time, the large amount of data collected by drones may incur high communication overhead when transferred to the cloud. Federated learning, as a means of privacy protection, can effectively address both problems. However, federated learning applied to UAV networks also needs to consider data heterogeneity, which is caused by regional differences in UAV regulation. In response, this paper proposes a new algorithm, FedBA, to optimize the global model and address the data heterogeneity problem. In addition, we apply the algorithm to several real datasets, and the experimental results show that it outperforms other algorithms and improves the accuracy of the local model for UAVs.
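
As background on why federated learning keeps raw data on the device, here is a minimal sketch of the standard weighted-averaging aggregation step (FedAvg). It is not FedBA, whose details are not given in this abstract; the function name `federated_average` and the flat-list representation of model parameters are assumptions for illustration.

```python
def federated_average(client_weights, client_sizes):
    """Aggregate client parameter vectors, weighted by local dataset size.

    client_weights: list of equal-length lists of floats (one list per client).
    client_sizes:   number of local training samples held by each client.
    Only parameters are shared; the raw data never leaves the clients.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same model shape")
        for k, w in enumerate(weights):
            averaged[k] += w * size / total
    return averaged


# Two hypothetical UAV clients holding 1 and 3 samples respectively.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

When client data are non-IID, as the abstract describes for UAVs, this plain average can drift toward the larger clients, which is the kind of issue heterogeneity-aware methods aim to correct.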

Automatic Analysis of Available Source Code of Top Artificial Intelligence Conference Papers

Jialiang Lin, Yingmin Wang, Yao Yu, Yu Zhou, Yidong Chen, Xiaodong Shi

Category: Artificial Intelligence | Natural Language Processing | Machine Learning

2022-09-28

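Source code is essential for researchers to reproduce the methods and replicate the results of artificial intelligence (AI) papers. Some organizations and researchers manually collect AI papers with available source code to contribute to the AI community. However, manual collection is a labor-intensive and time-consuming task. To address this issue, we propose a method to automatically identify papers with available source code and extract their source code repository URLs. With this method, we find that 20.5% of the regular papers of 10 top AI conferences published from 2010 to 2019 are identified as papers with available source code, and that 8.1% of these source code repositories are no longer accessible. We also create the XMU NLP Lab README Dataset, the largest dataset of labeled README files for source code documentation research. Through this dataset, we find that many README files provide no installation instructions or usage tutorials. Further, a large-scale comprehensive statistical analysis is carried out to give a general picture of the source code of AI conference papers. The proposed solution can also go beyond AI conference papers to analyze other scientific papers from both journals and conferences, shedding light on more domains.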
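
The URL-extraction step mentioned above can be sketched roughly as follows. This is not the authors' method: the host list, the regular expression, and the function name `extract_repo_urls` are assumptions for illustration, and a real pipeline would also need to decide whether a linked repository actually belongs to the paper.

```python
import re

# Assumed list of code-hosting domains; the paper's actual rules are not given here.
HOSTS = ("github.com", "gitlab.com", "bitbucket.org")
REPO_URL = re.compile(
    r"(?:https?://)?(?:www\.)?("
    + "|".join(re.escape(h) for h in HOSTS)
    + r")/([\w.-]+)/([\w.-]+)"
)


def extract_repo_urls(text):
    """Return de-duplicated repository URLs (host/owner/repo) found in `text`."""
    seen, urls = set(), []
    for host, owner, repo in REPO_URL.findall(text):
        repo = repo.rstrip(".")       # drop a sentence-ending period
        if repo.endswith(".git"):
            repo = repo[:-4]          # normalise clone URLs
        if not repo:
            continue
        url = f"https://{host}/{owner}/{repo}"
        if url.lower() not in seen:   # compare case-insensitively
            seen.add(url.lower())
            urls.append(url)
    return urls


sample = ("Our code is at https://github.com/example-user/example-repo. "
          "A mirror: gitlab.com/example-user/example-repo.git")
repos = extract_repo_urls(sample)
```

Checking whether each extracted URL still resolves (the 8.1% figure above concerns repositories that no longer do) would need an HTTP request per URL, which is left out of this sketch.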

Learning to Evaluate Performance of Multi-modal Semantic Localization

Zhiqiang Yuan, Wenkai Zhang, Chongyang Li, Zhaoying Pan, Yongqiang Mao, Jialiang Chen, Shouke Li, Hongqi Wang, Xian Sun

Category: Computer Vision

2022-09-14

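Semantic localization (SeLo) refers to the task of obtaining the most relevant locations in large-scale remote sensing (RS) images using semantic information such as text. As an emerging task based on cross-modal retrieval, SeLo achieves semantic-level retrieval with only caption-level annotation, which demonstrates its great potential for unifying downstream tasks. Although work on SeLo has been carried out successively, no study has yet systematically explored and analyzed this pressing direction. In this paper, we thoroughly study this field and provide a complete benchmark in terms of metrics and test data to advance the SeLo task. First, based on the characteristics of this task, we propose multiple discriminative evaluation metrics to quantify SeLo performance. The devised significant area proportion, attention shift distance, and discrete attention distance are used to evaluate the generated SeLo map at the pixel level and the region level. Next, to provide standard evaluation data for the SeLo task, we contribute a diverse, multi-semantic, multi-objective Semantic Localization Testset (AIR-SLT). AIR-SLT consists of 22 large-scale RS images and 59 test cases with different semantics, and aims to provide a comprehensive evaluation of retrieval models. Finally, we analyze the SeLo performance of RS cross-modal retrieval models in detail, explore the impact of different variables on this task, and provide a complete benchmark for the SeLo task. We also establish a new paradigm for RS referring expression comprehension, and demonstrate the great advantage of SeLo in semantics by combining it with tasks such as detection and road extraction. The proposed evaluation metrics, semantic localization test set, and corresponding scripts are available at github.com/xiaoyuan1996/semanticlocalizationmetrics.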

Inferring Tabular Analysis Metadata by Infusing Distribution and Knowledge Information

Xinyi He, Mengyu Zhou, Jialiang Xu, Xiao Lv, Tianle Li, Yijia Shao, Shi Han, Zejian Yuan, Dongmei Zhang

Category: Machine Learning

2022-09-02

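Many data analysis tasks rely heavily on a deep understanding of tables (multi-dimensional data). Across these tasks, there are commonly used metadata attributes of table fields / columns. In this paper, we identify four such analysis metadata: the measure/dimension dichotomy, common field roles, semantic field type, and default aggregation function. Inferring these metadata faces the challenges of insufficient supervision signals, utilizing existing knowledge, and understanding distributions. To infer these metadata for a raw table, we propose a multi-task Metadata model that fuses field distribution and knowledge graph information into pre-trained tabular models. For model training and evaluation, we collect a large corpus of analysis metadata (~582K tables from private spreadsheets and public tabular datasets) by using diverse smart supervision from downstream tasks. Our best model achieves accuracy = 98%, hit rate at top-1 > 67%, accuracy > 80%, and accuracy = 88% on the four analysis metadata inference tasks, respectively. It outperforms a series of baselines based on rules, traditional machine learning methods, and pre-trained tabular models. The analysis metadata models are deployed in a popular data analysis product, supporting downstream intelligent features such as insights mining, chart / pivot table recommendation, and natural language QA ...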

Few-Shot Learning of Accurate Folding Landscape for Protein Structure Prediction

Jun Zhang, Sirui Liu, Mengyun Chen, Haotian Chu, Min Wang, Zidong Wang, Jialiang Yu, Ningxi Ni, Fan Yu, Diqing Chen

Category: Machine Learning | Artificial Intelligence

2022-08-20

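Data-driven predictive methods that can efficiently and accurately transform protein sequences into biologically active structures are highly valuable for scientific research and therapeutic development. Determining an accurate folding landscape using co-evolutionary information is fundamental to the success of modern protein structure prediction methods. As the state of the art, AlphaFold2 has dramatically raised accuracy without performing explicit co-evolutionary analysis. Nevertheless, its performance still shows a strong dependence on available sequence homologs. We investigate the cause of this dependence and present EvoGen, a meta generative model, to remedy the underperformance of AlphaFold2 on targets with poor multiple sequence alignments (MSAs). EvoGen allows us to manipulate the folding landscape either by denoising the searched MSA or by generating a virtual MSA, and helps AlphaFold2 fold accurately in the low-data regime and even achieve encouraging performance with single-sequence predictions. Being able to make accurate predictions with few-shot MSAs not only lets AlphaFold2 generalize better to orphan sequences, but also democratizes its use in high-throughput applications. In addition, EvoGen combined with AlphaFold2 yields a probabilistic structure generation method that can explore alternative conformations of protein sequences, and the task-aware differentiable algorithm for sequence generation will benefit other related tasks, including protein design.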
